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Abstract. Fractional imputation (FI) is a relatively new method of im¬ 
putation for handling item nonresponse in survey sampling. In FI, sev¬ 
eral imputed values with their fractional weights are created for each 
missing item. Each fractional weight represents the conditional prob¬ 
ability of the imputed value given the observed data, and the param¬ 
eters in the conditional probabilities are often computed by an itera¬ 
tive method such as EM algorithm. The underlying model for FI can 
be fully parametric, semiparametric, or nonparametric, depending on 
plausibility of assumptions and the data structure. 

In this paper, we give an overview of FI, introduce key ideas and 
methods to readers who are new to the FI literature, and highlight some 
new development. We also provide guidance on practical implementa¬ 
tion of FI and valid inferential tools after imputation. We demonstrate 
the empirical performance of FI with respect to multiple imputation 
using a pseudo finite population generated from a sample in Monthly 
Retail Trade Survey in US Census Bureau. 

Key words and phrases: Item nonresponse. Missing at random, Monte 
Carlo EM, Multiple imputation. Synthetic imputation. 


1. INTRODUCTION 

In survey sampling, it is a common practice to collect data on a large num¬ 
ber of items. Even when a sampled unit responds to the survey, this unit may 
not respond to some items. In this scenario, imputation can be used to create a 
complete data set by filling in missing values with plausible values to facilitate 
data analyses. The goal of imputation is three-fold: First, by providing complete 
data, subsequent analyses are easy to implement and can achieve consistency 
among different users. Second, imputation reduces the selection bias associated 
with only using the respondent set, which may not necessarily represent the orig¬ 
inal sample. Third, the imputed data can incorporate extra information so that 
the resulting analyses are statistically efficient and coherent. Combining informa¬ 
tion from several surveys or creating synthetic data from planned missingness are 
cases in point (Schenker and Raghunathan 2007). 

When the imputed data set is released to the public, it should meet the goal 
of multiple uses both for planned and unplanned parameters (Haziza, 2009). 

Room 4 . 37 A, HSPH2, 655 Huntington Ave, Boston, MA 02115 (e-mail: 
shuyang@hsph.harvard.com). 1208 Snedecor Hall, Iowa State University, Ames, 
lA 50011 (e-mail: jkim@iastate.edu). 

1 

imsart-sts ver. 2014/10/16 file: paper_review_FI_sts.tex date: August 28, 2015 



2 


S. YANG AND J. K. KIM 


In a typical survey situation, the imputers may know some of the parameters 
of interest at the time of imputation, but hardly know the full set of possible 
parameters to be estimated from the data. Single imputation, such as hot deck 
imputation, regression imputation and stochastic regression imputation, replaces 
each of the missing data with one plausible value. Although single imputation has 
been widely used, one drawback is that it does not take into account of the full 
uncertainty of missing data and often falls short of multiple-purpose estimation. 
Multiple imputation (MI) has been proposed by Rubin (1976) to replace each of 
missing data with multiple plausible values to reflect the full uncertainty in the 
prediction of missing data. Several authors (Rubin 1987; Little and Rubin 2002; 
Schafer 1997) have promoted MI as a standard approach for general-purpose 
estimation under item nonresponse in survey sampling. Although the variance 
estimation formula of Rubin (1987) is simple and easy to apply, it is not always 
consistent (Fay 1992; Wang and Robins 1998; Kim et al. 2006). For using the MI 
variance estimation formula, the congeniality condition of Meng (1994) needs to 
be met, which can be restrictive for general-purpose inference. For example, Kim 
(2011) pointed out that a MI procedure that is congenial for mean estimation is 
not necessarily congenial for proportion estimation. 

Fractional imputation (FI) is another effective imputation tool for general- 
purpose estimation with its advantage of not requiring the congeniality condition. 
FI was originally proposed by Kalton and Kish (1984) to reduce the variance of 
single imputation methods by replacing each missing value with several plausi¬ 
ble values at differentiable probabilities reflected through fractional weights. Fay 
(1996), Kim and Fuller (2004), Fuller and Kim (2005), Durrant (2005), Durrant 
and Skinner (2006) discussed FI as a nonparametric imputation method for de¬ 
scriptive parameters of interest in survey sampling. Kim (2011) and Kim and 
Yang (2014) presented FI under fully parametric model assumptions. 

More generally, FI can also serve as a computational tool for implementing 
the expectation step (E-step) in the EM algorithm (Wei and Tanner 1990; Kim 
2011). When the conditional expectation in the E-step is not available in a closed 
form, parametric FI of Kim (2011) simplihes computation by drawing on the 
importance sampling idea. Through fractional weights, FI can reduce the bur¬ 
den of iterative computation, such as Markov Chain Monte Carlo, for evaluating 
the conditional expectation associated with missing data. Kim and Hong (2012) 
extended parametric FI to a more general class of incomplete data, including 
measurement error models. 

Despite these advantages, FI in applied research has not been widely used due 
to lack of good information that provides researchers with comprehensive under¬ 
standing of this approach. The goal of this paper is to bring more attention to 
FI by reviewing existing research on FI, introducing key ideas and methods, and 
highlighting some new development, mainly in the context of survey sampling. 
This paper also provides guidance on practical implementations and applications 
of FI. 

This paper is organized as follows. Section 2 provides the basic setup and Sec¬ 
tion 3 introduces FI under parametric model assumptions. Section 4 discusses 
a nonparametric approach to FI, specially in the context of hot deck imputa¬ 
tion. Section 5 introduces synthetic data imputation using FI in the context of 
two-phase sampling and statistical matching. Section 6 deals with practical con- 
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siderations and variations of FI, including imputation sizes, choices of proposal 
distributions and doubly robust FI. Section 7 compares FI with MI in terms of 
efficiency of the point estimator and the variance estimator. Section 8 demon¬ 
strates a simulation study based on an actual data set. A discussion concludes 
this paper in Section 9. 


2. BASIC SETUP 

Consider a ffnite population of N units identified by a set of indices U = 
{1, 2, • • • , N} with N known. The p-dimensional study variable m = {yn , • • • ,yip), 
associated with each unit i in the population, is subject to missingness. We assume 
that the finite population at hand is a realization from an infinite population, 
called a superpopulation. In the superpopulation model, we often postulate a 
parametric distribution, f{y,6), with the parameter 9 € Q. We can express the 
density for the joint distribution of y as 


( 2 - 1 ) f{y; 6 ) = fi{yi]0i)f2{y2 \ yi; O2) ■ ■ ■ fp{yp \yi,--- ,yp-i-,Gp) 


where 9k is the parameter in the conditional distribution of yk given yi, - ■ ■ , yk-i- 
Now let A denote the set of indices for units in a sample selected by a probability 
sampling mechanism. Each unit is associated with a sampling weight, the inverse 
of the probability of being selected to the sample, denoted by Wi. 

We are interested in estimating 77 , defined as a (unique) solution to the popu¬ 
lation estimating equation ^(^5 m) — 0 - I^or example, a population mean of 

y can be obtained by letting U{r];yi) = rj — yi, a population proportion of y less 
than a threshold c can be obtained by specifying U{rf,yi) = y — where 

I is an indicator function, a population median of y can be obtained by choos¬ 
ing U{r];yi) = 0.5 — and so on. Under complete response, a consistent 

estimator of y is obtained by solving 

( 2 . 2 ) '^WiU{y;yi) =0. 

ieA 

Godambe and Thompson (1986), Binder and Patak (1994) and Rao, Yung, and 
Hidiroglou (2002) have done rigorous investigations on the estimator obtained 
from ( 2 . 2 ) under complex sampling. 

In the presence of missing data, first consider decomposing y, = (yofes,u Vmis,i)^ 
where yobs,i and ymis,i are the observed and missing part of y*, respectively. We 
assume that the response mechanism is missing at random (MAR) in the sense 
of Rubin (1976). That is, the probability of nonresponse does not depend on the 
missing value itself. Under MAR, a consistent estimator of y can be obtained 
by solving the conditional estimating equation, given the observed data yobs = 
i.yobs,li • • ■ ) yobs,n)} 

(2.3) ^ WiE{U{y; yi) \ yobs,i} = 0, 

i&A 


where the above conditional expectation is taken with respect to the prediction 
model (also called the imputation model). 


(2.4) 


f{y 


mis,i 


Uobs^i: — 


f {yobs,i-) Umis^i') 

f f {Uobs^i^ Umis^i] ^)dy n 
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Table 1 

Comparison of two approaches of inference with missing data 



Bayesian 

Frequentist 

Model 

Posterior distribution 

Prediction model 


/(latent, 6 \ Obs.) 

/(latent | Obs.,0) 

Learning algorithm 
Prediction 
Parameter update 

Data augmentation 
Imputation(I)-step 
Posterior (P)-step 

EM algorithm 
Expectation(E)-step 
Maximization(M)-step 

Imputation 

Multiple imputation 

Fractional imputation 

Variance estimation 

Rubin’s formula 

Linearization 
or replication 


which depends on the unknown parameter 6 . Imputation is thus a computational 
tool for computing the conditional expectation in (2.3) for arbitrary choices of 
the estimating function U{r]]y). The resulting conditional expectation using im¬ 
putation can be called the imputed estimating function. 

Table 1 presents a summary of Bayesian and frequentist approaches of sta¬ 
tistical inference with missing data. In the Bayesian approach, 0 is treated as a 
random variable and the reference distribution is the joint distribution of 6 and 
the latent (missing) data, given the observed data. On the other hand, in the 
frequentist approach, 0 is treated as fixed and the reference distribution is the 
conditional distribution of the latent data, conditional on the observed data, for 
a given parameter 0. The learning algorithm, that is, the algorithm for updat¬ 
ing information for parameters from observed data, for the Bayesian approach 
is data augmentation (Tanner and Wong 1987), while the learning algorithm for 
the frequentist approach is usually the EM algorithm. 

MI is a Bayesian imputation method and the imputed estimating function is 
computed with respect to the posterior predictive distribution, 

f {ymis,i I Vobs) = J fiy mis,i I yobs,i^^)p{^ I yobs)d^ 1 

which is the average of the predictive distribution f{ymis,i \ yobs,fid) over the 
posterior distribution of 9. On the other hand, in the frequentist approach, the 
conditional expectation in (2.3) is taken with respect to the prediction model 

(2.4) evaluated at 9 = 9, a consistent estimator of 9. For example, one can use 
the pseudo MLE 9 ot 9 obtained by solving the pseudo mean score equation (Louis 
1982; Pfeffermann et al. 1998), 

(2.5) S{9) = ^ WiE{S{9-, yi)\yi,obs; 9} = 0, 

ieA 

where S{9;yi) = dlogf{yi]9)/d9. 

While the Bayesian approach to imputation, especially in the context of MI, is 
well studied in the literature, the frequentist approach to imputation is somewhat 
sparse. FI has been proposed to fill in this important gap. In FI, the conditional 
expectation in (2.3) is computed by a weighted mean of the imputed estimating 
functions 

M 

( 2 - 6 ) E{U{TT,yi) \ yobs,i} = '^w*jU{r]-,yobs,iiy*rl{i^i)- 

i=i 
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where for j = 1,..., M, are M imputed values for ymis,i (if yi is completely 

observed, ymili = ymis,i)i w*j are the fractional weights that satisfies w*j > 0, 
= 1 and 

M 

^ TUi ^ w*jS{9-, yobs,i, y*J,lli) = 0. 

isA j=l 

Once the FI data are constructed, the FI estimator of rj is obtained by solving 

M 

(2-7) yobs,i, y*J:lli) = o. 

ieA j=l 

In general, the FI method augments the original data set as 
(2.8) Sfi = {wi,yi) + (1 - <5*) {wiw*j,y*j) ;j = l,...,M,ieA}, 

where 6 i is the indicator of full response for y^, and y*^ = {yobs,i,y^l i)- If (2-6) 
holds for an arbitrary U function, the resulting estimator is approximately unbi¬ 
ased for a fairly large class of parameters, which makes the imputation attractive 
for general-purpose estimation. Kim (2011) used the importance sampling tech¬ 
nique to satisfy (2.6) for general U functions, which will be presented in the next 
section. 


3. PARAMETRIC FRACTIONAL IMPUTATION 


Parametric Fractional Imputation (PFI), proposed by Kim (2011), features a 
parametric model for fractional imputations, and parameters in the imputation 
model are estimated by a computationally efficient EM algorithm. 

To compute the conditional estimating equation in (2.3) by PFI, for each miss¬ 
ing value ymis,i, generate M imputed values, denoted by • • • > 

a proposal distribution h{ymis,i \ yobs,i)- How to choose a proposal distribution 
will be discussed in Section 6.2. Once the imputed values are generated from h{-), 
compute 

* I yobs,i'i^) 

. UP , -T’ 

^iymis,i I yobs,i) 

subject to = 1, as the fractional weights assigned to y*^ = {yobs,i, y^l,i), 

where 9 is the pseudo MLE of 9 to be determined by the EM algorithm below. 
Since above fractional weight is the same as w*j = w*j{9), 

where 


(3.1) 


w*j{9) oc 


f{yobs,i, ymil/i 

^(.ymiLi I yobs,i) 


which only requires the knowledge of the joint distribution f{y, 9) and the pro¬ 
posal distribution h. 

The pseudo MLE of 9 can be computed by solving the imputed mean score 
equation, 


M 

(3.2) ^u('^)‘S’(^; yobs,i, 2/m£,i) = 0. 

ieA j=l 
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To solve (3.2), we can either use the Newton method or the following EM algo¬ 
rithm: 

I-step. For each missing value ymis,i, M imputed values are generated from a 
proposal distribution h{ymis,i \ yobs,i)- 

W-step. Using the current value of the parameter estimates compute the 
fractional weights as oc fiyobs,i,y*r!ili,0{t))/Hy*Jdli I yobs,i), subject 

to Ei=i = 1- 

M-step. Update the parameter d{t+i) by solving the imputed score equation, 

M 

ieA j=l 


where?/*- = {yobs,i,ymil i) andS(0;?/) = dlogf{y;9)/d6 is the score function 
of 6. 

Iteration. Set t = t+1 and go to the W-step. Stop if 0(?+i) meets the convergence 
criterion. 

Here, the I-step is the imputation step, the W-step is the weighting step, and 
the M-step is the maximization step. The I- and W-steps can be combined 
to implement the E-step of the EM algorithm. Unlike the Monte Carlo EM 
(MCEM) method, imputed values are not changed for each EM iteration - only 
the fractional weights are changed. Thus, the FI method has computational ad¬ 
vantages over the MCEM method. Convergence is achieved because the imputed 
values are not changed. Kim (2011) showed that given the M imputed values, 
2/mis i) • • • > Vmish ^be Sequence of estimators {0(o), 0{i), ■ ■ •} from the W-and M- 
steps converges to a stationary point 9*^ for hxed M. The stationary point 9]^ 
converges to the pseudo MLE of 0 as M —>■ oo. The resulting weight w*j after 

convergence is the fractional weight assigned to y*j = {yobs,i, y’^l i) ■ We may add 
an additinal step to monitor the distribution of the fractional weights so that no 
extremely large fractional weights dominate the weights. 

Once the fractional imputed data is constructed from the above steps, it can 
be used to estimate other parameters of interest. That is, we can use (2.7) to 
estimate rj from the FI data set. 

We now consider a bivariate missing data example to illustrate the use of the 
EM algorithm in FI. 

Example 1. Suppose a probability sample eonsists ofn units of Zi = (xj, yu, y 2 i) 
with sampling weight Wi, where Xi is always observed and yi = {yu, y 2 i) is subject 
to missingness. Let An, Aiq, Aqi, and ^oo be the partition of the sample based 
on the missing pattern, where subscript 1/0 in the i-th position denote that the 
i-th y item is observed/mis sing, respectively. For example, ^lo is the set of the 
sample with yn observed and yi 2 missing. 

The conditional expectation in (2.3) involves evaluating the conditional distri¬ 
bution of ymis,i given the observed data Xi and yobs,i for each missing pattern. 
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which is then decomposed into 


'Y^Wi¥.{U{ri]Zi)\xi,yobs,i} = WiU{'q\Xi,yii,yi2)+'^ WiE{U{iT, Xi,Yii,Yi2) 

i&A isAii isAoo 

I E WiE{U{'q]Xi,Yii,yi2) \ Xi,yi2}+ E WiE{U{r]]Xi,yii,Yi2) \ Xi,yii}. 

isAoi isAio 

Suppose the joint distribution in ( 2 . 1 ) is 


(3.3) f{x,yi,y 2 ; 6 ) = fx{x; 6 o)fi{yi \ x; 9 i)f 2 {y 2 \ a:,yi; 6 » 2 ). 


From the full respondent sample in An, obtain 0i(o) and 02 ( 0 ), which are initial 
parameter estimates for 9i and 02 ■ 

In the I-step, for each missing value ymis,i, generate M imputed values from 
h{ymis,i I Xi,yobs,i) = f{ymis,i I Xi,yobsY9(o)), where 


(3.4) 


f {ymis,i I Xi, yobs,i', ^(0)) 


f 2 {y 2 i I x,yii;02(o)) 

^ f{yii\x,y2i;9^o)) 

f{yii,y 2 i\ Xi;9(^0)) 


if i G Aio 
if i e ^01 
if i G ^00 


and 


(3.5) f{yii\ Xi,y2i;9(o)) = 


fijyii I Xi;9^o))f2{y2i \ Xi,yu] 02(0)) 
f fiiyii I Xi] 0 i(^o))f 2 {y 2 i I Xi,yu] 02 (o))dyii 


Note that the marginal distribution of x, fx{x; 0o), is not used in (3.5). Except for 
some special cases such as when both fi and f 2 are normal distributions, the con¬ 
ditional distribution in (3.5) is not in a known form. Thus, some computational 
tools such as Metropolis-Hasting (Hastings 1970) or SIR (Sampling Importance 
Resampling, Smith and Gelfand 1992) are needed to generate samples from (3.5) 
for i G Aqi. For example, the SIR consists of the following steps: 

1. Generate B (say B = lOOj Monte Carlo samples, denoted by yll^\ • • • , yli^\ 
from fi{yu \ Xi-, 0 ^o)). 

2. Among the B samples obtained from Step 1, select one sample with the 

selection probability proportional to f 2(921 \ Xi,yll^'^] 92 [o)), where is 

the k-th sample from Step 1 (k = 1, - ■ ■ ,B). 

3. Repeat Step 1 and Step 2 independently M times to obtain M imputed 
values. 

Once we obtain M imputed values ofyu, we can use 


h(yii I Xi,y 2 i) oc fi{yii \ xu 0 i(o))/ 2 (y 2 i | Xi,yii;02(o)) 

as the proposal density in (3.j). Since = 1? '*^6 do not need to compute 

the normalizing constant in (3.5). For i G ^10, M imputed values of y 2 i are 
generated from f 2 {y 2 i \ Xi,yu] 02 {o))- Fori G ^( 00 ); FI imputed values ofyu are 
generated from fi(yii \ Xi;0i(o)) then M imputed values of y 2 i are generated 

from f 2 (y 2 i I Xi,yli; 02 (p)). 
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In the W-step, the fraetional weights are computed by 




1 , where = yu if yu is observed and = y 2 i if y 2 i is 


observed. 


The above example covers a broad range of applications in the missing data 
literature, such as missing covariate problems, measurement error models, gener¬ 
alized linear mixed models, and so on. Yang and Kim (2014) considered regression 
analyses with missing covariates in survey data using FI, where in the current 
notation, f{y 2 \ x,yi) is a regression model with y 2 and x fully observed and 
yi subject to missingness. In generalized linear mixed models, f{y 2 \ x,yi) is a 
generalized linear mixed model where yi is the latent random effect. See Yang, 
Kim, and Zhu (2013) for using FI to estimate parameters in the generalized linear 
mixed models. 

For variance estimation, note that the imputed estimator fjpj obtained from 
the imputed estimating equation (2.7) depends on 6 obtained from (3.2). To 
reflect this dependence, we can write rjpj = 'fjpi{0). To account for the sampling 
variability of 9 in the imputed estimator ijpi, either the linearization method or 
replication methods can be used. In the linearization method, the imputation 
model is needed in order to compute partial derivatives of the score functions. To 
avoid disclosing the imputation model, replication methods are often preferred 
(Rao and Shao 1992). To implement the replication variance estimation in FI, we 
first obtain the k-ih replicate pseudo MLE 01^1 of 9 by solving 


M 



(3.6) 


i^A j=l 


where is the fe-th replication weight and w*j{9) is defined in (3.1). To obtain 
§[i^] from (3.6), either EM algorithm or the one-step Newton method can be used. 


EM algorithm can be implemented similarly as before. For the one-step Newton 
method, we have 



where 
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with S{9]y) = dS{9]y)/d6^ and = BB^. Once is obtained, we obtain 
the /c-th replicate of fj by solving 

M 

Y1 ^ yij^ = 0 

i^A j=l 

for y, where 

4. NONPARAMETRIC FRACTIONAL IMPUTATION 


4.1 Fractional Hot Deck Imputation 


Hot deck imputation uses observed responses from the sample as imputed val¬ 
ues. The unit with a missing value is called the recipient and the unit providing 
the value for the imputation is called the donor. Durrant (2009), Haziza (2009) 
and Andridge and Little (2010) provided comprehensive overviews of hot deck 
imputation in survey sampling. The attractive features of hot deck imputation 
include the following. First, unlike model-based imputation methods that gener¬ 
ate artificial imputed values, in hot deck imputation, only plausible values can 
be imputed, and therefore distributional properties of the data are preserved. 
For example, imputed values for categorical variables will also be categorical, as 
observed from the respondents. Second, compared to fully parametric methods, 
hot deck imputation makes less or no distributional assumptions and therefore is 
more robust. For these reasons, hot deck imputation is a widely used imputation 
method, especially in household surveys. 

Fractional hot deck imputation (FHDI) combines the ideas of FI and hot deck 
imputation. It is efficient (due to FI), and it inherits the aforementioned good 
properties of hot deck imputation. Kim and Fuller (2004), Fuller and Kim (2005), 
and Kim and Yang (2014) considered FHDI for univariate missing data. We 
now describe a multivariate FHDI procedure to deal with missing data with an 
arbitrary missing pattern (Im et al. 2015). 

We first consider categorical data. Let z = {zi,..., zk) ^oe the vector of study 
variables that take categorical values. Let Zj = {zn ,..., ZiA be the i-th realiza¬ 
tion of z. Let 5ij be the response indicator variable for Zij. That is, 5ij = 1 if 
Zij is observed and 5ij = 0 otherwise. Assume that the response mechanism is 
MAR. Based on 5i = {Sn,... ,5iK), the original observation Zj can be decom¬ 
posed into {zobs,i, Zmis,i), which are the missing and observed part of Zj, respec¬ 
tively. Let Di = be the set of all possible values of Zmis,i, 

that is, {zobs,i, actually observed value in the respondents, for 

j = 1,..., Mi, with Mi > 0. If all of Mi possible values are taken as the imputed 
values for Zmis,i, the fractional weight assigned to the j-th imputed value z^l^^ 
is 




hi) 


= 


*{k) , ' 

,zA'i) 


( 4 - 1 ) 

Ak&Di '^{^obs,i, '^rnis,H 

where vr(z) is the joint probability of z. If the joint probability is nonparametri- 
cally modeled, it is computed by 


(4.2) 


7r(z) = - 


EieA EjGDi W*jI{{Zobs,i, z*Jili) 
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where = Zmis,i and w*j = M- \ for j = 1,..., M- \ if z* is completely ob¬ 

served. To compute (4.1) and (4.2), EM algorithm by weighting (Ibrahim 1990) 
can be used, with the initial values of fractional weights being = M~ . 

Equations (4.1) and (4.2) correspond to the E-step and M-step of the EM al¬ 
gorithm, respectively. The M-step (4.2) can be changed if there is a parametric 
model for the joint probability vr(z). Eor example, if the joint probability can be 
modeled by a multinomial distribution with parameter a, say 7r(z;Q;), then the 
M-step replaces (4.2) with solving the imputed score equation of a to update the 
estimate of a. 

Eor continuous data y = {yi,..., yx), we consider a discrete approximation. 
Discretize each continuous variable by dividing its range into a small finite number 
of segments (for example, quantiles). Let Zik denote the discrete version of yik- 
Note that Zik is observed only if yik is observed. Let the support of z, denoted 
by {zi,..., zg}, which is the same as the sample support of z from the full 
respondents, specify donor cells. The joint probability of z, denoted by 7r(zg), for 
<7 = 1,..., G, can be obtained by the EM algorithm for categorical missing data 
as described above. 

As in the categorical missing data problem, let Di = ..., be the 

set of all possible values of Zmis,i- Using a finite mixture model, a nonparametric 
approximation of f{ymis,i \ yobs,i) is 

(4.3) fiymis,i I yobs,i) ~ I yobs,i)f{ymis,i I Z-^^^). 

1=1 

Each = {zobs,i, defines an imputation cell. The approximation in (4.3) 

is based on the assumption that 

(4.4) P{ymis I yobs^'P) = P{ymis \ z), 

which requires (approximate) conditional independence between ymis and yobs 
given z. Thus, we assume that the covariance structure between items are cap¬ 
tured by the discrete approximation and the within cell errors can be safely as¬ 
sumed to be independent. Once the imputation cells are formed to satisfy (4.4), 
we select rUg imputed values for ymis^i, denoted by = {yobs,i,y*Jili), for 

j = randomly from the full respondents in the same cell, with the 

selection probability proportional to the sampling weights. The final fractional 
weights assigned to is w*j = P{z^ili \ yobs,i)rn~^. 

This EHDI procedure resembles a two-phase stratified sampling (Rao 1973, 
Kim et al. 2006), where forming the imputation cells corresponds to stratifica¬ 
tion (phase one) and conducting hot deck imputation corresponds to stratified 
sampling (phase two). For more details, see Im, Kim, and Fuller (2015). 

If we select all possible donors in the same cell, the resulting FI estimator 
is fully efficient in the sense that it does not introduce additional randomness 
due to hot deck imputation. Such fractional hot deck imputation is called fully 
efficient fractional imputation (FEFI). The FEFI option is currently available at 
Proc Surveyimpute in SAS (SAS Institute Inc. 2015). 

4.2 Nonparametric Fractional Imputation Using Kernels 

In real-data applications, nonparametric methods are preferred if less is known 
about the true underlying data model. Hot deck imputation makes less or no dis- 
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tributional assumptions and therefore is more robust than fully parametric meth¬ 
ods. In what follows, we discuss an alternative way of calculating the fractional 
weights that links the FI estimator to some well-known nonparametric estimators, 
such as Nadaraya-Watson kernel regression estimator (Nadaraya 1964). 

For simplicity, suppose we have bivariate data {xi,yi) where Xi is completely 
observed and yi is subject to missing. Assume the missing data mechanism is 
MAR. Let 5i be the response indicator that takes the value one if yt is observed 
and takes zero otherwise. We are interested in estimating y, which is defined 
through E{U{r]; X, R)} = 0. Let Ar = {i G A; = 1} be the index set of respon¬ 
dents. To calculate the conditional estimating equation (2.3) nonparametrically, 
we use the following fractional imputation: for each unit i with <5j = 0, r = \Aji\ 
imputed values of y* are taken from Ar, denoted by y*^^\ ■ ■ ■ , and compute 
the Kernel-based fractional weights w*j = Kh{xi — x*^^'^)/ Ylk&An ^h{xi — 
where Kh{-) is the kernel function with bandwidth h and is the covariate 
associated with y*^^^ ■ The resulting FI estimating equation can be written as 

(4.5) J2'^i\^i^iE,Xi,yi) + {l-6i) '^h^i'n',Xi,yf^^) \ =0, 

*6A [ J&Ar ] 

where the nonparametric fractional weights measure the degrees of similarity 
based on the distance between Xi and x*^^\ The FI estimator uses U{r];xi) = 
r/;*jC/(ry; Xj, to approximate E{U{r]; Xi,yi) \ Xi} nonparametrically. 

For fixed rj, U{r]]Xi) is often called the Nadaraya-Watson kernel regression esti¬ 
mator of E{U{r];xi,yi) \ Xi} in the nonparametric estimation framework. Note 
that this FI estimator does not rely on any parametric model assumptions and so 
is nonparametric; however it is not assumption free because it makes an implicit 
assumption of the continuity of E{U{r]-,x,y) \ Xi} through the choice of kernels 
to dehne the “similarity” (Nadaraya 1964). Notably, while the convergence of 
U{r]]Xi) to E{U{rj;xi,yi) \ Xi} does not achieve the order of Op{l/y/n), the solu¬ 
tion fjFi to (4.5) satisfies fjFi — y = Op{l/^/n) under some regularity conditions, 
which was proved by Wang and Chen (2009) in the IID setup. 

Such kernel-based nonparametric fractional imputation can be directly appli¬ 
cable to complex survey sampling scenarios. More developments are expected by 
coupling FI with other nonparametric methods such as those using the nearest 
neighbor imputation method (Chen and Shao 2001; Kitamura et al. 2009; Kim 
et al. 2011) or predictive mean matching (Vink et al. 2014). 

5. SYNTHETIC DATA IMPUTATION 

Synthetic imputation is a technique of creating imputed values for the un¬ 
observed items by incorporating information from other surveys. For example, 
suppose that there are two independent surveys, called Survey 1 and Survey 2, 
and we observe Xi from Survey 1 and observe (x*, yi) from Survey 2. In this case, 
we may want to create synthetic values of y* in Survey 1 by first fitting a model 
relating y to x to the data from Survey 2 and then predicting y associated with 
X observed in Survey 1. Synthetic imputation is particularly useful when Survey 
1 is a large scale survey and item y is very expensive to measure. Schenker and 
Raghunathan (2007) reported several applications of synthetic imputation, using 
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a model-based method to estimate parameters associated with variables not ob¬ 
served in Survey 1 but observed in a much smaller Survey 2. In one application, 
both self-reported health measurements Xi and clinical measurements from phys¬ 
ical examinations y* for a small sample A 2 of individuals were observed. In the 
much larger Survey 1, only self-reported measurements, Xi were observed. Only 
the imputed or synthetic data from Survey 1 and associated survey weights were 
released to the public. 

The setup of two independent samples with common items is often called non¬ 
nested two-phase sampling. Two-phase sampling can be treated as a missing data 
problem, where the missingness is planned and the response probability is known. 

5.1 Fractional Imputation for Two-phase Sampling 

In two-phase sampling, suppose we observe Xi in the first-phase sample and 
observe {xi,yi) in the second-phase sample, where the second-phase sample is 
not necessarily nested within the first-phase sample. Let Ai and wn be the set of 
indices and the set of sampling weights for the first-phase sample, respectively. 
Let A 2 and Wi 2 be the corresponding sets for the second-phase sample. Assume 
a “working” model m{xi;j3) for E{y \ Xi). For estimation of the population total 
of y, the two-phase regression estimator can be written as 

(5.1) Ytp = ^ Wiim{xi\P)+'^ Wi2{yi - m{xi] /3)}, 

i^A\ 2GA2 


where the subscript ”tp” stands for ”two-phase”, and /3 is estimated from the 
second-phase sample. The two-phase regression estimator is efficient if the work¬ 
ing model is well-specified. The first term of (5.1) is called the projection es¬ 
timator. Note that if the second term of (5.1) is equal to zero, the two-phase 
regression estimator is equivalent to the projection estimator. Some asymptotic 
properties of the two-phase estimator and variance estimation methods have been 
discussed in Kim, Navarro, and Fuller (2006), and Kim and Yu (2011a). Kim and 
Rao (2012) discussed asymptotic properties of the projection estimator under 
non-nested two-phase sampling. 

In a large scale survey, it is a common practice to produce estimates for do¬ 
mains. Creating an imputed data set for the hrst-phase sample, often called mass 
imputation, is one method for incorporating the second-phase information into 
the first-phase sample. Breidt and Fuller (1996) discussed the possibility of us¬ 
ing imputation to get improved estimates for domains. Fuller (2003) investigated 
mass imputation in the context of two-phase sampling. 

The FI procedure can be used to obtain the two-phase regression estimator in 

(5.1) and, at the same time, improve domain estimation. Note that the two-phase 
regression estimator (5.1) can be written as 

(5.2) Yfefi = X] X] 

ieAi j&A2 

where y*^^^ = m + ej, m = m{xi;P), ej = yj - yj, w*^ = Wj 2 /{YlkeA 2 ^k 2 ), 
and we assume X^ieAi ^*1 ~ The expression (5.2) implies that we 

impute all the elements in the first-phase sample, including the elements that 
also belong to the second-phase sample. The estimator (5.2) is computed using an 
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augmented data set of ni x n 2 records, where ni and n 2 are the sizes of Ai and A 2 , 
respectively, and the (i, j)-th record has an (imputed) observation = yi + ej 
with weight Wiiw*^. That is, for each unit i £ Ai, we impute n 2 values of y*^^^ 
with fractional weight w*j. The method in (5.2) imputes all the elements in A 2 
and is called fully efficient fractional imputation (FEFI) method, according to 
Fuller and Kim (2005). The FEFI estimator is algebraically equivalent to the 
two-phase regression estimator of the population total of y, and can also provide 
consistent estimates for other parameters such as population quantiles. 

If it is desirable to limit the number of imputations to a small value m [m < 
77 - 2 ), FI using the regression weighting method in Fuller and Kim (2005) can be 
adopted. We first select m values of y*^^\ denoted by y**^^\ ■ ■ ■ ,y*i*^'^\ among 
the set of n 2 imputed values £ A 2 } using an efficient sampling method. 

The fractional weights w*j assigned to the selected y**^^'^ are determined so that 

m 

( 5 . 3 ) = E < 

i=i 16^2 

holds for each i £ Ai. The fractional weight satisfying (5.3) can be computed using 
the regression weighting method or the empirical likelihood method, see section 

6.1 for details. The resulting FI data y**^^'^ with weights wnw^j are constructed 
with m X m records, which integrate available information from two phases. 
Replication variance estimation with FI, similar to Fuller and Kim (2005), can 
be developed. See Section 8.7 of Kim and Shao (2013). 

5.2 Fractional Imputation for Statistical Matching 

Statistical matching is used to integrate two or more data sets when informa¬ 
tion available for matching records for individual participants across data sets is 
incomplete. Statistical matching can be viewed as a missing data problem where 
a researcher wants to perform a joint analysis of variables not jointly observed. 
Statistical matching techniques can be used to construct fully augmented data 
hies to enable statistically valid data analysis. 

Table 2 

A Simple Data Structure for Matching 

K Ti 
Sample A o o 
Sample Bo o 


To simplify the setup, suppose that there are two surveys. Survey A and Survey 
B, each containing a random sample with partial information about the popula¬ 
tion. Suppose that we observe x and yi from the Survey A sample and observe x 
and y 2 from the Survey B sample. Table 2 illustrates a simple data structure for 
matching. 

Without loss of generalizability, consider imputing yi in Survey B, since im¬ 
puting 7/2 in Survey A is symmetric. Under this setup, we can use FI to generate 
yi from the conditional distribution of yi given the observations. That is, we 
generate yi from 

( 5 . 4 ) / (7/1 I X, 7/2) oc / (7/2 I X, 7/1) / (yi I x). 
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Of note, assumptions are needed to identify the parameters in the joint model. For 
example, Kim, Berg, and Park (2015) used an instrumental variable assumption 
to identify the model. To generate yi from (5.4), the EM algorithm by FI can be 
used. For more details, see Kim, Berg, and Park (2015). 

6. FRACTIONAL IMPUTATION VARIANTS 
6.1 The Choice of M and Calibration Fractional Imputation 

The choice of the imputation size M is a matter of tradeoff between statistical 
efficiency and computation efficiency: small M may lead to large variability in 
Monte Carlo approximation; whereas large M may increase computational cost. 
The magnitude of the imputation error is usually 0{l/y/M), which can be re¬ 
duced for large M. Thus, if computational power allows, the larger M, the better. 

In survey practices, a large imputation size may not be desirable. Thus, instead 
of releasing to public large number of imputed values for each missing item, a 
subset of initial imputation values can be selected to reduce the imputation size. 
In this case, the FI procedure can be developed in three stages. The first stage, 
called Fully Efficient Fractional Imputation (FEFI), computes the pseudo MLE of 
parameters in the superpopulation model with sufficiently large imputation size 
M, say M = 1,000. The second stage is the Sampling Stage, which selects small 
m (say, m = 10) imputed values from the set of M imputed values. The third 
stage is Calibration Weighting, which involves constructing the final fractional 
weights for the m final imputed values to satisfy some calibration constraints. 
This procedure can be called Calibration FI. 

The FEFI step is the same as in the previous section. In what follows, we 
describe the last two stages in details. In the Sampling Stage, a subset of im¬ 
puted values are selected to reduce the imputation size. For each i, we have M 
imputed values = {yobs,i-,y*mil i) with their fractional weights w*y We treat 
y* = {y*j,j = 1,Mj as a weighted finite population with weight w*j and use 
an unequal probability sampling method such as probability-proportion-to-size 
(PPS) sampling to select a sample of size m, say m = 10, from y* using w*j as 
the selection probability. Let y*^,, y*^ be the m elements sampled from y*. 

The initial fractional weights for the sampled m imputed values are given by 
w*jQ = m~^. This set of fractional weights may not necessarily satisfy the imputed 
score equation 


m 



( 6 . 1 ) 


isA j=l 


where 0 is the pseudo MLE of 0 computed at the FEFI stage. It is desirable 
for the solution to the imputed score equation with small m to be equal to the 
pseudo MLE of 0, which specifies the calibration constraints. At the Calibration 
Weighting stage, the initial set of weights are modified to satisfy the constraint 
(6.1). Finding the calibrated fractional weights can be achieved by the regres¬ 
sion weighting technique, by which the fractional weights that satisfy (6.1) and 
= 1- The regression fractional weights are constructed by 


( 6 . 2 ) 
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where S*j = S{9;y*^), S* = YJf=iWijoStj, and 

m m 

i&A j=l i^A j=l 

Note that some of the fractional weights computed by (6.2) can take negative 
values. To avoid negative weights, alternative algorithms other than regression 
weighting should be used. For example, the fractional weights of the form 

<ioexp(A5*.) 

Er=i<.oexp(A5;,) 

are approximately equal to the regression fractional weights in (6.2) and are 
always positive. 

6.2 The Choice of the Proposal Distribution 

PFI is based on sampling from an importance sampling density h called the 
proposal distribution. The choice of the proposal distribution is somewhat ar¬ 
bitrary. However, with hnite samples and imputations, a well-specified proposal 
distribution may improve the performance of the imputation estimator. There are 
a number of ways to specify the proposal distribution and to assess the goodness 
of specification. 

For a planned parameter, e.g., t], the population mean of y, Kim (2011) showed 
the optimal h* that makes Monte Carlo approximation variance of y* = YlfLi '’^ijVij 
as small as possible, is given by 


h iymis,i\yobs,i) — f iymis,i\yobs,i: X 


\yi ^{yi\yobs,ii 9}\ 
^{yi\yobs,ij ^}\yobs,i: 


where 6 is the MLE of 6 . For general-purpose estimation, y is often unknown at 
the time of imputation according to Fay (1992), h{ymis,i\yobs,i) = f{ymis,i\yobs,u (^) 
is a reasonable choice in terms of statistical efficiency. For importance sampling, 
since we do not know 0 at the outset of the EM algorithm, we may want to have a 
good initial guess Oq and use h{ymis,i\^i, yobs,i) = f{ymis,i\^iiyobs,u ^o)- If we don’t 
have a good initial guess of the true value of 0, we can use a prior distribution 
7r(9) to get h(ymis,ilyobs,i) = f f(ymis,ilyobs,i; 9)'^(9)d0. 

We now discuss a special choice of the proposal distribution h, based on the 
realized values of the variables having missing values, which is akin to hot deck 
imputation. Without loss of generality, assume that y* is observed in the hrst r 
elements, y* is missing in the remaining (n —r) elements, and Xi is completely ob¬ 
served in the sample. Using the importance sampling idea, we assign a fractional 
weight to donor yj {1 < j < r) for the missing item yi (r-|-1 < f < n) by choosing 
^(Vj) — fiVj I — !)• I^ calculating the fractional weights, we approximate 
f{yj I 6 j = 1) by its empirical distribution n]i^ Ylk=i^kf (yj \ Xk), where ur is 
the number of respondents. The EM algorithm takes the following steps: 

I-step For each missing value yt, i = r + 1,... ,n, take all values in Ar = 
{yi,..., yr} as donors. 
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W-step With the current estimate of 9, denoted by compute the fractional 
weights by 


(6.3) 


, fiVj \ Xi-,9(^t)) 

^ ^ - ft I 

z2kGAn'Wkf[yj I Xk^e^t)) 


M-step Update the parameter ^(t+i) by solving the following imputed score 
equation, 

r nr 

0(t+i) : solution to ^ 5(0; Xj, y*) + ^ ^ u;*j‘^5(0; Xj, y^) = 0. 

i=l i=r+l j=l 


Iteration Set t = t+1 and go to the W-step. Stop if 0(t+i) meets the convergence 
criterion. 

The semiparametric fractional imputation (SFI) estimator of Y is 


. I r nr 

i=\ i=r-\-l j=l 

Kim and Yang (2014) showed that the resulting estimator gains robustness. It 
is less sensitive against the departure from the assumed conditional regression 
model. 

6.3 Doubly Robust Fractional Imputation 

Suppose we have bivariate data (xj,yi) where Xj is completely observed and 
Ui is subject to missing and missing data mechanism is MAR. Assume also an 
outcome regression (OR) model, given by E{yi \ Xj) = m{xi; /3o), and the response 
propensity (RP) model, given by P{5i = 1 | Xi,yi) = P{5i = 1 | Xj) = 7r{xi-,4>o)- 
Denote the set of respondents as Aji = {i,5i = 1}, where 6 i is the response 
indicator of y*. We are interested in the population total rj = Vi- Note that 
not both the OR and RP models are needed to construct consistent estimators 
of p. For example, pi = t(;im(xj;/3), with f3 being a consistent estimator of 

/3o, is consistent to p under the OR model and p 2 = Wiyi/'K{xi\ c^), with ^ 

being a consistent estimator of (j)o, is consistent to p under the RP model. 

An estimator of p is doubly robust if it is consistent if either the OR model 
or the RP model is correct, but not necessarily both. This property guards the 
estimator from possible model misspecifications. The DR estimators have been ex¬ 
tensively studied in the literature, including Robins, Rotnitzky, and Zhao (1994), 
Bang and Robins (2005), Tan (2006), Kang and Schafer (2007), Cao, Tsiatis, 
and Davidian (2009), and Kim and Haziza (2014). We now discuss a fractional 
imputation estimator that has the double robustness feature. 

For each missing y*, let y*j = yi + ij be the y-th imputed value from the 
donor j £ Ar, where y* = m(xj;/3) with /3 fitted under the OR model and 

= Uj-iTT'ixi] $)■ If = Y^i(^A'^ii each unity G Ar represents 

l/7r(xj;(^) copies of the sample. Then, the fractional weight w*j associated with 
the y-th imputed value y*j is proportional to {l/7r(xy; (p) — 1} over the donor pool 
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Afi (minus one because yj itself counts one), that is, 

Wj{l/TT{xj](l)o) - 1 } 


(6.4) 


= 


T,keA'^kSk{'i-/'^{xk-,(i)) - 1 } 

Under this weight construction, the fractional imputation estimator is given by 


(6.5) 


fjFi = 

i&A 


SiVi + (1 - <5i){^ 

i=i 


We show that the fractional imputation estimator t)fi in (6.5) is doubly robust. 
First notice that ypi is algebraically equal to 


( 6 . 6 ) 


fjFI = '^Wi 
ieA 


S' 

Mxf, / 3 ) + - Mxf, / 3 )} 

F{Xi;(j)) 

Let fjn = be the full sample estimator of of rj, then 

f 5,; 


riFI - Tin = '^Wi 




-l}{yi- m{xi-,P)}. 


This is an asymptotically unbiased estimator of zero if either the OR model or the 
RP model is correct, but not necessarily both. Kim and Haziza (2014) discussed 
efficient estimation of (/3, (p) in survey sampling. 


7. COMPARISON WITH MULTIPLE IMPUTATION 
7.1 Statistical Efficiency 

In the presence of missing data with MAR, multiple imputation (MI) is a 
popular method. It is thus of interest to compare the behavior of these two 
methods. We start from a simple setting with the complete data z being randomly 
drawn from a population whose density is f{z]9), where 0 G is an unknown 
parameter to be estimated. Suppose that m complete data sets are created by 
imputing the missing data Zmis from the posterior predictive distribution given 
the observed data Zobs f{zmis \ Zobs) = f fizmis \ Zobs',9)TT{6 \ Zobs)d0, where 
f{0 I Zobs) is the posterior distribution of 9. The MI estimator of 9, denoted by 

9mi is 

m 

9 MI = 

k=l 

where is the MLE estimator applied to the /c-th imputed data set. Rubin’s 
formula is used for variance estimation in MI, 

Vmi{9mi) = Wm + (1 + 

where Wm = m~^ Bm = {m - 1)“^ ~ dMi)'^, and is 

the variance estimator of 9 under complete response applied to the k-th imputed 
data set. 


imsart-sts ver. 2014/10/16 file: paper_review_FI_sts.tex date: August 28, 2015 









18 


S. YANG AND J. K. KIM 


Of note, Bayesian MI is a simulation-based method and thus introduce ad¬ 
ditional noise. This explains why the asymptotic variance of the MI estimator, 
given by Wang and Robins (1998), 

(7.1) VmI = Tofts + ^ ^com^rnis^com + ^ J ^obs'^^ 
is strictly larger than the asymptotic variance of the FI estimator 

(7.2) Vpi = T^fts + "bn 

where Team = E{S{e)®^}, lobs = E{Sobs{e)®^} , Trais = Team " Toft., 5(0) = 
5(Z; 9) = (91og/(Z; 6)186 is the log likelihood score if the data were completely 
observed and Sobs{G) = E{S{6) \ Zobs} is the score function of the observed 
data log likelihood , J = Imis^Zom is the fraction of missing information matrix 
(Rubin 1987, Chapter 4). This difference between (7.1) and (7.2) can be sizable 
for a small m. Furthermore, for a large m, although the MI estimator is efficient, 
the inference is inefficient since Rubin’s variance estimator of the MI estimator 
is only weakly unbiased, that is Vmi{6mi) converges in distribution instead of 
coverages in probability to Vmi- This leads to much broader confidence intervals 
and less powerful tests than a consistent variance estimator would do (Nielsen 
2003). 

For MI inference to be valid for general-purpose estimation, imputations must 
be proper according to Rubin (1987). A sufficient condition is given by Meng 
(1994). The so-called congeniality condition, imposed on both the imputation 
model and the form of subsequent complete-sample analyses, is quite restrictive 
for general-purpose estimation. Otherwise, as discussed by Fay (1992; 1996), Kott 
(1995), Binder and Sun (1996), Robins and Wang (2000), Nielsen (2003), and 
Kim et al. (2006), the MI variance estimator is not always consistent. Kim et al. 
(2011) pointed out that MI that is congenial for mean estimation is not necessarily 
congenial for proportion estimation. Yang and Kim (2015b) showed that the MI 
variance estimator can be positively or negatively biased when the method of 
moments estimator is used as the complete-sample estimator. In contrast, FI, as 
we discussed in section 4, does not require congeniality and always results in a 
consistent variance estimator for general-purpose estimation. 

7.2 Imputation under Informative Sampling 

Under informative sampling, the MAR assumption is subtle. We assume that 
the response mechanism is MAR at the population level, now referred to as 
population missing at random (PMAR), to be distinguished from the concept of 
sample missing at random (SMAR). For simplicity, assume y is a one-dimensional 
variable which is subject to missing, 5 is its response indicator, and I is the sample 
inclusion indicator. PMAR assumes that y E 5 \ x, that is, MAR holds at the 
population level, f{y \ x) = f{y \ x,6). On the other hand, SMAR assumes 
Y E 5 \ {x, I = 1), that is, MAR holds at the sample level, f{y \ x, I = 1) = f{y \ 
x,I = 1,5). The two assumptions are not testable empirically. The plausibility of 
these assumptions should be judged by subject matter experts. Often, PMAR is 
more realistic because an individual’s decision on whether or not to respond to a 
survey depends on his or her own characteristics, rather than the fact of him or 
her being in the sample or not. 
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Figure 1. A directed acyclic graph (DAG) for a setup where PMAR holds but SMAR does not 
hold. Variable U is latent in the sense that it is never observed. 

For noninformative sampling design, we have P{I = 1 \ x,y) = P{I = 1 | x), 
under which PMAR implies SMAR; however for informative sampling design, 
PMAR does not necessarily imply SMAR. In such cases, using an imputation 
model fitted to the sample data for generating imputations can result in biased 
estimation. 

FI does not require SMAR to hold besides PMAR. Under PMAR, we have 
f{y I x,(5 = 0) = f{y \ x). Let f{y \ x;/3) be a parametric model of f{y \ 
x). The parameter /3 can be consistently estimated by solving (2.5), even under 
informative sampling. Since FI generates the imputations from f{y \ x] 13), with a 
consistent estimator (3, the resulting FI estimator is approximately unbiased (Berg 
et al. 2015). Whereas, MI tends to problematic under informative sampling. By 
using an augmented model, where the imputation model is augmented to include 
sampling weights or some function of them, as f{y \ x,w), the MI point estimator 
was claimed to be approximately unbiased (Rubin 1996; Schenker et al. 2006). 
However, as pointed out by Berg, Kim, and Skinner (2015), it is not always true. 
For example, Y is conditionally independent of 5 given X as presented in Figure 
1. However, Y is not conditionally independent of 6 given X and L Augmenting 
X by including sampling weights does not solve the problem. The existence of the 
latent variable U, which is correlated with I and 5, makes SMAR unachievable. 

8. SIMULATION STUDY 

We investigated the performance of FI compared to MI by a limited simulation 
study using an artihcial finite population generated from real survey data. The 
pseudo finite population was generated from a single month of the U.S. Census 
Bureau’s Monthly Retail Trade Survey (MRTS). Each month, the MRTS surveys 
a sample of about 12,000 retail businesses with paid employees to collect data on 
sales and inventories. The MRTS is an economic indicator survey whose monthly 
estimates are inputs to the Gross Domestic Product estimates. The MRTS sample 
design is typical of business surveys, employing one-stage stratihed sampling with 
stratification based on major industry, further substratified by the estimated 
annual sales. The sample design requires higher sampling rates in strata with 
larger units than in strata with smaller units. More details about MRTS can be 
found in Mulry, Oliver, and Kaputa (2014). 

The original population hie contains 19, 601 retail businesses stratihed into 
16 strata, with a strata identiher (/i), sales (y), and inventory values (x). For 
simulation purpose, we focus on the hrst 5 strata as a hnite population, consisting 
of 7, 260 retail businesses. Figure 2 shows the scatter plot of sales and inventory 
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X Stratum 2 

E Stratum 3 

Stratum 4 
■ Stratum 5 


log inventory 


Figure 2. Scatter plot of log sales and log inventory data by strata 


data by strata on a log scale. We assumed the following superpopulation model, 

(8.1) log(yhi) = Poh + Pih log{xhi) + eu, 

where f3oh and I3ih are strata-specific parameters with h being the strata identi- 
her, and e^i ~ To assess the adequacy of model (8.1), we made some 

diagnostic plots. Figure 3 shows the residual plot and the normal Q-Q plot for the 
htted model (8.1). From the residual plot, the constant variance assumption of e^i 
appears to be reasonable. From the normal Q-Q plot, the normality assumption 
of ehi approximately holds. 

To create missing, we considered univariate missing where only y has missing 
values. We generated the response indicator S of y according to 

6 ~ Bernoulli{'K), vr = 1/[1 + exp{4 — 0.31og(x)}]. 

Under this model, the missing mechanism is MAR and the response rate is about 

0 . 6 . 

The parameters of interest are the stratum mean of y, rjh = fJ-h for 1 < /i < 5, 
and the population mean of y, ye = y. The true parameter values are yi = 92.25, 
r }2 = 67.90, r ]3 = 18.24, 774 = 13.01, rj^ = 5.92, and ye = 20.40. The estimation 
methods included (i) Full, the full sample estimator, which is used as a benchmark 
for comparison, (ii) MI, the multiple imputation estimator with imputation size 
M = 100, and {in) PFI, the parametric fractional imputation estimator with 
imputation size M = 100, where the model parameters are estimated by the 
pseudo MLE solving the score equation (4). 
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Figure 3. Regression model o/log(j/) against \og(x) and strata indicator 


To generate samples, we considered stratified sampling with simple random 
sampling within strata (STSRS) without replacement. Table 3 shows strata sizes 
Nh, sample sizes Uh, and sampling weights. The sampling weights range from 
12.57 to 45.79. The samples are generated 2,000 times. 


Table 3 

The sample allocation in stratified simple random sampling. 


Strata 

1 

2 

3 

4 

5 

Strata size Nh 

352 

566 

1963 

2181 

2198 

Sample size Uh 

28 

32 

46 

46 

48 

Sampling weight 

12.57 

17.69 

42.67 

47.41 

45.79 


For MI, we considered the imputation models in (8.1). Because the sampling 
design is stratified random sampling and the imputation model includes the stra¬ 
tum indicator function, the sampling design becomes noninformative. We first 
imputed log(y) from the posterior distribution of (8.1), given the observed data, 
and then transformed the imputations to the original scale of y. The implemen¬ 
tation of MI was carried out by the “mice” package in R. In each imputed data 
set, we applied the following full-sample point estimators and variance estimators; 

= N~^ NhVh with Vh being the sample mean of y in the /i-th stratum Sh, 

V{v) = Ef=i ^h(l - nhlNh)sllnh with sf = [uh - 1)"^ “ Vh?■ 

For PFI, we considered the imputation model in (8.1). The proposal distribution 
in the importance sampling step is the imputation distribution evaluated at ini¬ 
tial parameter values estimated from the available data. In PFI, for estimating 
model parameters, we obtained the pseudo MLEs by solving the score equations 
weighted by sampling weights, as in (4). After imputation, y was estimated by (5) 
by choosing U to be the corresponding estimating function. We used the delete-1 
Jackknife replication method for variance estimation, 

VRifi) = Yj 

h=l ^ i(^Sh 
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where is computed by omitting unit i € Sh and modifying the weights so that 
Whj is replaced by UhWhj/{nh — 1 ) for all j £ Sh and the weight remains the same 
for all other j. 

Table 4 shows the numerical results. The mean and variance are calculated 
as the Monte Carlo mean and variance of the point estimates across the sim¬ 
ulated sample data. The relative bias of the variance estimator is calculated as 
{{ve — var)/var} x 100%, where ve is the Monte Carlo mean of variance estimates 
and var is Monte Carlo variance of point estimates. In addition, 95% confidence 
intervals are calculated as {fj — zo,Q■J 5 ^/V,7] + zo,Q 75 ^/V), where zo .975 is the 97.5% 
quantile of the standard normal distribution. The three estimators are essentially 
unbiased for point estimation. The variances for PFI and MI are close for all pa¬ 
rameters. However, for inference, the validity of Rubin’s variance estimator relies 
on the congeniality condition (Meng 1994), which holds when MLEs are used 
as the full-sample estimator in MI, but not for Method-of-Moments estimators 
(MMEs) under MAR (Yang and Kim 2015b). As shown in Table 4, Rubin’s vari¬ 
ance estimator of the MI estimator is biased upward for strata means and the 
population mean with relative bias 48.06%, 30.53%, 23.05%, 23.02%, 16.96% for 
1 < J < 5 and 32.75% for [Imi- Under the log normal distribution and 
MAR, the MMEs are not self-efficient and Rubin’s variance estimator is biased, 
which is consistent with the results in Meng (1994) and Yang and Kim (2015b). 
Among those, Stratum 1 has largest bias of the variance estimator, followed by 
Stratum 2, given their smaller sample sizes compared to other strata. In addition, 
the mean width of confidence intervals is larger than that of FI. For the popu¬ 
lation mean, we used the Horvitz-Thompson (HT) estimator as the full-sample 
estimator instead of the MLE under log-normal distribution. It is well-known 
that the HT estimator is robust but inefficient, which results in bias in Rubin’s 
variance estimator. The coverage of 95% confidence interval reaches 98.3% for 
the population mean due to variance overestimation. In contrast, PFI variance 
estimators applied to the HT estimator are essentially unbiased and provides 
empirical coverages close to the nominal coverage. 

9. CONCLUDING REMARKS 

In survey sampling, MI and FI are two available approaches of imputation 
for general-purpose estimation. In MI, Rubin’s variance estimation formula is 
recommended because of its simplicity, but it requires the congeniality condition 
of Meng (1994), which can be restrictive in practice. A merit of FI is that the 
congeniality condition is not needed for consistent variance estimation. When 
the sampling design is informative, MI can use an augmented model to make the 
sampling design noninformative. However, incorporating all design information 
into the model is not always possible (Reiter et al. 2006) and valid inference 
under MI is not easy or sometimes impossible (Berg et al. 2015). In contrast, FI 
can handle informative sampling design easily as it incorporates sampling weights 
into estimation instead of modeling. 

So far, we have presented FI under the MAR case. Parametric FI can be 
adapted to a situation, where the missing values are suspected to be missing not 
at random (MNAR) (Kim and Kim 2012; Yang et al. 2013). A semiparametric 
FI using the exponential tilting model of Kim and Yu (2011b) is also promis¬ 
ing, which is under development. Also, FI can be used to approximate observed 
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log likelihood easily (Yang and Kim 2015a). The approximation of the observed 
log likelihood can be directly applied to model selections or model comparisons 
with missing data, such as using Akaike Information Criterion or the Bayesian 
Information Criterion. Further investigation on this topic will be worthwhile. 

We conclude the paper with the hope that continuing efforts will be made into 
developing statistical methods and corresponding computational programs (an R 
software package is in progress) for FI, so as to make these methods accessible to 
a broader audience. 
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